2.2 Resampling

This section will introduce alternative resampling methods for finding a good balance between bias and variance for model evaluation and selection.

"Alternative resampling methods are introduced that strike a good balance between bias and variance when evaluating and selecting models."

performance estimates may suffer from bias and variance, and we are interested in finding a good trade-off.

"Estimates of generalization performance may suffer from bias and variance, so the goal is to find a good trade-off between the two."

Resubstitution evaluation (evaluating on the training data) is very optimistically biased.

1.3 Resubstitution Validation and the Holdout Method

Holding out a large portion of the dataset as the test set leads to a pessimistic bias.

While reducing the size of the test set may decrease this pessimistic bias, the variance of the performance estimate will most likely increase.

"Shrinking the test set (which enlarges the training set) reduces the pessimistic bias, but the variance of the generalization performance estimate grows."

In the normal approximation, the n in the denominator gets smaller, so the confidence interval widens (the variance grows).
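A minimal sketch of the note above, assuming the usual normal-approximation interval ACC ± z * sqrt(ACC * (1 - ACC) / n) with z = 1.96 for a 95% interval. The function name and the example accuracy of 0.9 are illustrative assumptions, not values from the source.

```python
import math


def accuracy_confidence_interval(acc, n, z=1.96):
    """Normal-approximation interval for an accuracy measured on n test examples.

    z=1.96 corresponds to a 95% interval (an assumption for this sketch).
    """
    if n <= 0:
        raise ValueError("n must be a positive test set size")
    half_width = z * math.sqrt(acc * (1.0 - acc) / n)
    return acc - half_width, acc + half_width


# Same hypothetical accuracy, shrinking test set: the interval gets wider.
for n in (10000, 1000, 100):
    lower, upper = accuracy_confidence_interval(0.9, n)
    print(f"n={n:>5}: [{lower:.3f}, {upper:.3f}]  width={upper - lower:.3f}")
```

Because n sits under the square root, cutting the test set to a tenth of its size widens the interval by a factor of about 3.16 (the square root of 10), not by a factor of 10.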

Figure 3

Section 1.2 spoke of a "statistical sense"; was that referring to bias and variance?

Setting aside a large test set leaves less data for training, so the model does not reach its full capacity (pessimistic bias).

Figure 4: evaluated on a subset of MNIST

Small training set = large test set

When the training set is small, accuracy on the test set is low (it falls well below the accuracy on the training set).

two distinct trends: both point to a reduction in overfitting as the training set grows

1. "As the number of training samples grows, the resubstitution accuracy on the training set declines."

the resubstitution accuracy (training set) declines as the number of training samples grows.

2. "As the training set size increases, the generalization accuracy on the test set improves (= increases)."

we observe an improving generalization accuracy (test set) with an increasing training set size.

If the training set is small, the algorithm is more likely picking up noise in the training set so that the model fails to generalize well to data that it has not seen before.

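"With a small training set, the algorithm is more prone to picking up noise in the training set, so the model fails to generalize to data it did not see during training."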

Models trained on a small training set tend to overfit.

This also explains the pessimistic bias.

Decreasing the size of the test set brings up another problem: It may result in a substantial variance of a model’s performance estimate.

(We have seen that a large test set causes a pessimistic bias, but) "making the test set smaller results in substantial variance in the estimate of the model's generalization performance."
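The variance claim can be illustrated with a small simulation. This is only a sketch under a simplifying assumption: every test prediction is treated as an independent Bernoulli trial with a fixed true accuracy (0.9 here, a hypothetical value), so no actual model is trained. The function name and the test set sizes are likewise illustrative.

```python
import random
import statistics


def simulate_holdout_accuracies(true_accuracy, test_size, n_repeats, seed=0):
    """Simulate accuracy estimates obtained from n_repeats independent test sets.

    Each prediction is modeled as correct with probability true_accuracy
    (an assumption of this sketch, not a trained classifier).
    """
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_repeats):
        n_correct = sum(rng.random() < true_accuracy for _ in range(test_size))
        estimates.append(n_correct / test_size)
    return estimates


# The true accuracy is the same in every run; only the test set size changes.
for test_size in (3000, 300, 30):
    estimates = simulate_holdout_accuracies(0.9, test_size, n_repeats=200)
    print(
        f"test_size={test_size:>4}: "
        f"mean={statistics.mean(estimates):.3f}  "
        f"stdev={statistics.stdev(estimates):.4f}"
    )
```

The mean of the estimates stays close to the assumed true accuracy at every size, while their spread grows as the test set shrinks, which is the variance problem the quote describes.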

each time we resample a dataset, we alter the statistics of the distribution of the sample.

"Every time we resample a dataset, we change the statistics of the sample's distribution."

Stratification was covered in 1.4 Stratification.

However, the change in the underlying sample statistics along the features axes is still a problem that becomes more pronounced if we work with small datasets

"The change in the underlying sample statistics along the feature axes is still a problem."

📝 Does this mean that stratification preserves the per-class sample proportions, but the statistics still shift when viewed per feature?

"It is a problem that becomes more pronounced when working with small datasets."
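A rough sketch of the question raised above: a stratified split keeps the class proportions identical in both parts, yet the feature statistics of the two parts can still differ. The synthetic data (one Gaussian feature per class, 50 samples each) and the helper names are assumptions made purely for illustration.

```python
import random
import statistics


def stratified_split(samples, train_fraction, seed):
    """Split (feature, label) pairs so each class keeps the same train share."""
    rng = random.Random(seed)
    by_class = {}
    for feature, label in samples:
        by_class.setdefault(label, []).append((feature, label))
    train, test = [], []
    for label in sorted(by_class):
        members = by_class[label]
        rng.shuffle(members)
        cut = int(round(train_fraction * len(members)))
        train.extend(members[:cut])
        test.extend(members[cut:])
    return train, test


def class_one_share(part):
    """Fraction of samples in this part that carry label 1."""
    return sum(label for _, label in part) / len(part)


def feature_mean(part):
    """Mean of the single feature over this part."""
    return statistics.mean([feature for feature, _ in part])


# Hypothetical small dataset: 50 samples per class, feature ~ N(label, 1).
data_rng = random.Random(0)
samples = [(data_rng.gauss(label, 1.0), label) for label in (0, 1) for _ in range(50)]

train, test = stratified_split(samples, train_fraction=0.7, seed=1)
print("class-1 share  train/test:", class_one_share(train), class_one_share(test))
print("feature mean   train/test:", round(feature_mean(train), 3), round(feature_mean(test), 3))
```

Stratification only constrains the label counts, so nothing in the split forces the feature means (or any other feature statistic) of the two parts to agree.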

(Impression: I take this to mean that the holdout method is not recommended for small datasets.)

Figure 5

Datasets drawn with n=100 and n=1000

The distributions differ

Resampled 3 times with train : test = 70% : 30%
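The setup in these notes (n=100 versus n=1000, three 70% : 30% resamplings) can be imitated with a short sketch. Figure 5 itself is not reproduced here: the one-dimensional standard normal data, the seeds, and the helper names are assumptions chosen to keep the example small.

```python
import random
import statistics


def draw_dataset(n, seed):
    """Draw n values from a standard normal distribution (assumed for this sketch)."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n)]


def holdout_split(values, train_fraction, seed):
    """Shuffle a copy of the data and cut it into training and test parts."""
    rng = random.Random(seed)
    shuffled = list(values)
    rng.shuffle(shuffled)
    cut = int(round(train_fraction * len(shuffled)))
    return shuffled[:cut], shuffled[cut:]


for n in (100, 1000):
    data = draw_dataset(n, seed=1)
    print(f"n={n}: full-sample mean={statistics.mean(data):+.3f}")
    # Three resamplings of the same dataset, each with a different shuffle.
    for split_seed in range(3):
        train, test = holdout_split(data, train_fraction=0.7, seed=split_seed)
        print(
            f"  split {split_seed}: "
            f"train mean={statistics.mean(train):+.3f}  "
            f"test mean={statistics.mean(test):+.3f}"
        )
```

Comparing the per-split means for the two sample sizes gives a numeric counterpart to the visual differences between resamplings noted for the figure.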